Disease Gene Identification
   HOME

TheInfoList



OR:

Disease gene identification is a process by which scientists identify the mutant genotypes responsible for an inherited
genetic disorder A genetic disorder is a health problem caused by one or more abnormalities in the genome. It can be caused by a mutation in a single gene (monogenic) or multiple genes (polygenic) or by a chromosomal abnormality. Although polygenic disorders ...
. Mutations in these genes can include single nucleotide substitutions, single nucleotide additions/deletions, deletion of the entire gene, and other genetic abnormalities.


Significance

Knowledge of which genes (when non-functional) cause which disorders will simplify diagnosis of patients and provide insights into the functional characteristics of the mutation. The advent of modern-day
high-throughput sequencing DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. Th ...
technologies combined with insights provided from the growing field of genomics is resulting in more rapid disease gene identification, thus allowing scientists to identify more complex mutations.


Generic gene identification procedure

Disease gene identification techniques often follow the same overall procedure. DNA is first collected from several patients who are believed to have the same genetic disease. Then, their DNA samples are analyzed and screened to determine probable regions where the mutation could potentially reside. These techniques are mentioned below. These probable regions are then lined-up with one another and the overlapping region should contain the mutant gene. If enough of the genome sequence is known, that region is searched for
candidate gene The candidate gene approach to conducting genetic association studies focuses on associations between genetic variation within pre-specified genes of interest, and phenotypes or disease states. This is in contrast to genome-wide association studies ...
s. Coding regions of these genes are then sequenced until a mutation is discovered or another patient is discovered, in which case the analysis can be repeated, potentially narrowing down the region of interest. The differences between most disease gene identification procedures are in the second step (where DNA samples are analyzed and screened to determine regions in which the mutation could reside).


Pre-genomics techniques

Without the aid of the whole-genome sequences, pre-genomics investigations looked at select regions of the genome, often with only minimal knowledge of the gene sequences they were looking at. Genetic techniques capable of providing this sort of information include Restriction Fragment Length Polymorphism (RFLP) analysis and microsatellite analysis.


Loss of heterozygosity (LOH)

Loss of heterozygosity (LOH) is a technique that can only be used to compare two samples from the same individual. LOH analysis is often used when identifying cancer-causing
oncogenes An oncogene is a gene that has the potential to cause cancer. In tumor cells, these genes are often mutated, or expressed at high levels.
in that one sample consists of (mutant) tumor DNA and the other (control) sample consists of genomic DNA from non-cancerous cells from the same individual. RFLPs and microsatellite markers provide patterns of DNA polymorphisms, which can be interpreted as residing in a
heterozygous Zygosity (the noun, zygote, is from the Greek "yoked," from "yoke") () is the degree to which both copies of a chromosome or gene have the same genetic sequence. In other words, it is the degree of similarity of the alleles in an organism. Mo ...
region or a
homozygous Zygosity (the noun, zygote, is from the Greek "yoked," from "yoke") () is the degree to which both copies of a chromosome or gene have the same genetic sequence. In other words, it is the degree of similarity of the alleles in an organism. Mo ...
region of the genome. Provided that all individuals are affected with the same disease resulting from a manifestation of a deletion of a single copy of the same gene, all individuals will contain one region where their control sample is heterozygous but the mutant sample is homozygous - this region will contain the disease gene.


Post-genomics techniques

With the advent of modern laboratory techniques such as
High-throughput sequencing DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. Th ...
and software capable of genome-wide analysis, sequence acquisition has become increasingly less expensive and time-consuming, thus providing significant benefits to science in the form of more efficient disease gene identification techniques.


Identity by descent mapping

Identity by descent A DNA segment is identical by state (IBS) in two or more individuals if they have identical nucleotide sequences in this segment. An IBS segment is identical by descent (IBD) in two or more individuals if they have inherited it from a common a ...
(IBD) mapping generally uses single nucleotide polymorphism (SNP) arrays to survey known polymorphic sites throughout the genome of affected individuals and their parents and/or siblings, both affected and unaffected. While these SNPs probably do not cause the disease, they provide valuable insight into the makeup of the genomes in question. A region of the genome is considered identical by descent if contiguous SNPs share the same genotype. When comparing an affected individual to his/her affected sibling, all identical regions are recorded (ex. Shaded in red in above figure). Given that an affected sibling and an unaffected sibling do not have the same disease phenotype, their DNA must by definition be different (barring the presence of a genetic or environmental
modifier Modifier may refer to: * Grammatical modifier, a word that modifies the meaning of another word or limits its meaning ** Compound modifier, two or more words that modify a noun ** Dangling modifier, a word or phrase that modifies a clause in an am ...
). Thus, the IBD mapping results can be further supplemented by removing any regions that are identical in both affected individuals and unaffected siblings. This is then repeated for multiple families, thus generating a small, overlapping fragment, which theoretically contains the disease gene.


Homozygosity/autozygosity mapping

Homozygosity/Autozygosity mapping is a powerful technique, but is only valid when searching for a mutation segregating within a small, closed population. Such a small population, possibly created by the founder effect, will have a limited gene pool, and thus any inherited disease will probably be a result of two copies of the same mutation segregating on the same haplotype. Since affected individuals will probably be homozygous in the regions, looking at SNPs in a region is an adequate marker of regions of homozygosity and heterozygosity. Modern day
SNP array In molecular biology, SNP array is a type of DNA microarray which is used to detect polymorphisms within a population. A single nucleotide polymorphism (SNP), a variation at a single site in DNA, is the most frequent type of variation in the ge ...
s are used to survey the genome and identify large regions of homozygosity. Homozygous blocks in the genomes of affected individuals can then be laid on top of each other, and the overlapping region should contain the disease gene. This analysis is often extended by analyzing autozygosity, an extension of homozygosity, in the genomes of affected individuals. This can be accomplished by plotting a cumulative LOD score alongside the overlaid blocks of homozygosity. By taking into consideration the population allele frequencies for all SNPs via autozygosity mapping, the results of homozygosity can be confirmed. Furthermore, if two suspicious regions appear as a result of homozygosity mapping, autozygosity mapping may be able to distinguish between the two (ex. If one block of homozygosity is a result of a very non-diverse region of the genome, the LOD score will be very low). Tools for Homozygosity Mapping # HomSI: a homozygous stretch identifier from next-generation sequencing data A tool that identifies homozygous regions using deep sequence data.


Genome-wide knockdown studies

Genome-wide knockdown studies are an example of the
reverse genetics Reverse genetics is a method in molecular genetics that is used to help understand the function(s) of a gene by analysing the phenotypic effects caused by genetically engineering specific nucleic acid sequences within the gene. The process pr ...
made possible by the acquisition of whole genome sequences, and the advent of genomics and gene-silencing technologies, mainly
siRNA Small interfering RNA (siRNA), sometimes known as short interfering RNA or silencing RNA, is a class of double-stranded RNA at first non-coding RNA molecules, typically 20-24 (normally 21) base pairs in length, similar to miRNA, and operating ...
and deletion mapping. Genome-wide knockdown studies involve systematic knockdown or deletion of genes or segments of the genome. This is generally done in prokaryotes or in a
tissue culture Tissue culture is the growth of tissues or cells in an artificial medium separate from the parent organism. This technique is also called micropropagation. This is typically facilitated via use of a liquid, semi-solid, or solid growth medium, su ...
environment due to the massive number of knockdowns that must be performed. After the systematic knockout is completed (and possibly confirmed by mRNA expression analysis), the phenotypic results of the knockdown/knockout can be observed. Observation parameters can be selected to target a highly specific phenotype. The resulting dataset is then queried for samples which exhibit phenotypes matching the disease in question – the gene(s) knocked down/out in said samples can then be considered candidate disease genes for the individual in question.


Whole exome sequencing

Whole
exome sequencing Exome sequencing, also known as whole exome sequencing (WES), is a genomic technique for sequencing all of the protein-coding regions of genes in a genome (known as the exome). It consists of two steps: the first step is to select only the subs ...
is a brute-force approach that involves using modern day sequencing technology and DNA sequence assembly tools to piece together all coding portions of the genome. The sequence is then compared to a
reference genome A reference genome (also known as a reference assembly) is a digital nucleic acid sequence database, assembled by scientists as a representative example of the set of genes in one idealized individual organism of a species. As they are assemble ...
and any differences are noted. After filtering out all known benign polymorphisms, synonymous changes, and
intronic An intron is any nucleotide sequence within a gene that is not expressed or operative in the final RNA product. The word ''intron'' is derived from the term ''intragenic region'', i.e. a region inside a gene."The notion of the cistron .e., gene ...
changes (that do not affect splice sites), only potentially pathogenic variants will be left. This technique can be combined with other techniques to further exclude potentially pathogenic variants should more than one be identified.


See also

* Gene Disease Database * Gene identification * Haplotype tagging


References

{{Reflist, colwidth=35em Mutation Genomics Molecular biology Bioinformatics